A Cholinergic Feedback Circuit to Regulate Striatal Population Uncertainty
Authors
Abstract
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr’s three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments.

INTRODUCTION:

When tasked with taking an action in an unknown environment, there can be considerable uncertainty about which actions will lead to the best outcomes. A principled way to resolve this uncertainty is to use previous experience to guide behavior towards actions that have led to positive outcomes in the past and away from actions that have led to negative outcomes. Convergent evidence suggests that the basal ganglia can guide behavior by incorporating positive and negative feedback in a reinforcement learning process (O’Doherty et al., 2003; Barnes et al., 2005; Frank, 2005).
However, learning can be complicated in a changing environment, as the validity of past experiences and the relationship between actions and outcomes become uncertain as well. Mathematical models suggest that it is optimal to take uncertainty into account in learning and decision making (Yu and Dayan, 2005; Behrens et al., 2007; Mathys et al., 2011), but it is unclear whether the basal ganglia can directly consider uncertainty in feedback-based learning.

Basal ganglia-dependent learning is often described within the normative framework of reinforcement learning (RL) following the observation that signaling from dopaminergic afferents matches the pattern of a reward prediction error (RPE) (Montague et al., 1996; Bayer et al., 2007). An RPE is the signed difference between the observed and expected outcomes and is often used in RL to generate point-estimates of action-values (Sutton and Barto, 1998). Phasic dopamine is thought to provide an RPE signal to striatal medium spiny neurons (MSNs) and induce learning through changes in corticostriatal plasticity (Montague et al., 1996; Reynolds and Wickens, 2002; Calabresi et al., 2007), with opponent learning signals in the direct and indirect pathways (Frank, 2005; Collins and Frank, 2014). Within these pathways, separate populations code for the (positive and negative) values of distinct action plans (Samejima et al., 2005). Multiple lines of evidence in humans and animals support this model, including optogenetic manipulations (Tsai et al., 2009; Kravitz et al., 2012), synaptic plasticity studies (Shen et al., 2008), functional imaging (McClure et al., 2003; O’Doherty et al., 2003), genetics and pharmacology in combination with imaging (Pessiglione et al., 2006; Frank et al., 2009; Jocham et al., 2011) and evidence from medication manipulations in Parkinson’s patients (Frank et al., 2004).
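As a concrete illustration of the RPE-driven point estimate described above, the following is a minimal sketch of a delta-rule update with a fixed learning rate. It is not the neural model of this paper: the function name `rpe_update`, the two action labels, the learning rate of 0.5 and the outcome sequence are all assumptions chosen only to make the arithmetic easy to follow.

```python
def rpe_update(value, reward, learning_rate):
    """Apply one delta-rule update and return (new_value, rpe).

    Hypothetical helper for illustration; not taken from the paper.
    """
    rpe = reward - value  # signed difference: observed minus expected outcome
    new_value = value + learning_rate * rpe
    return new_value, rpe


# Point estimates of action values, initialised at zero (assumed starting point).
values = {"left": 0.0, "right": 0.0}

# Assumed sequence of (chosen action, observed outcome) pairs.
trials = [("left", 1.0), ("left", 1.0), ("right", 0.0), ("left", 0.0)]

for action, reward in trials:
    # Fixed learning rate: every RPE of a given size moves the estimate equally.
    values[action], rpe = rpe_update(values[action], reward, learning_rate=0.5)
    print(f"{action}: rpe={rpe:+.3f}, value={values[action]:.3f}")
```

With these assumed inputs, the estimate for "left" rises after two rewarded trials and is then pulled halfway back towards zero by a single unrewarded trial, since the learning rate does not depend on how well established the estimate is.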
Despite the substantial empirical support for RPE signals conveyed by dopamine, the simple RL mechanisms used to model the basal ganglia are inflexible in the degree to which they learn in a changing environment. RL models typically adopt a fixed learning rate, such that every RPE of similar magnitude equally drives learning. However, a more adaptive strategy in a changing environment is to adjust learning rates as a function of uncertainty, so that unexpected outcomes have greater influence when one is more uncertain of which action to take (e.g., initially before contingencies are well known, or following a change-point), but less influence once the contingencies appear stable and the task is well-known (Yu and Dayan, 2005; Behrens et al., 2007; Nassar et al., 2010; Mathys et al., 2011; Payzan-LeNestour and Bossaerts, 2011). This Bayesian perspective presents an additional challenge for basal ganglia-dependent learning: in order to take advantage of its own uncertainty over action selection, the basal ganglia would need a mechanism to translate its uncertainty into a learning rate.

Cholinergic signaling within the striatum offers a potential solution to this challenge. With few exceptions (Tan and Bullock, 2008; Ashby and Crossley, 2011), models of the basal ganglia typically do not incorporate striatal acetylcholine. Within the striatum, cholinergic interneurons are the predominant source of acetylcholine (Woolf and Butcher, 1981).
These interneurons, also known as tonically active neurons (TANs) due to their intrinsic 2-10 Hz firing pattern, are giant, spanning large areas of the striatum with dense axonal arborization and broad synaptic input (Goldberg and Reynolds, 2011). TANs appear to be necessary for learning only when flexibility is required (Ragozzino et al., 2009; Bradfield et al., 2013), suggesting that they might modulate the learning rate as a function of changes in outcome statistics (i.e., uncertainty). Similar to dopaminergic neurons, TANs show sensitivity to rewarding events and develop a learned phasic response to predictive cues (Aosaki et al., 1994; Morris et al., 2004). This response consists of a phasic burst in TAN activity followed by a pause that lasts a few hundred milliseconds (Aosaki et al., 1995). While the burst-pause response is temporally concomitant with the dopamine response (Morris et al., 2004), the unidirectional TAN response is not consistent with a bivalent RPE (Joshua et al., 2008) but instead is thought to provide a permissive signal for dopaminergic plasticity (Graybiel et al., 1994; Morris et al., 2004; Cragg, 2006).

But how would such a permissive signal be modulated by the network’s own uncertainty about which action to select? Because TANs receive broad inhibitory synaptic input from local sources including MSNs and GABAergic interneurons (Bolam et al., 1986; Chuhma et al., 2011), we hypothesized that the pause would be modulated by a global measure of uncertainty across the population of spiny neurons.
Given that MSN sub-populations code for distinct action values (Samejima et al., 2005; Frank, 2005; Lau and Glimcher, 2008), co-activation of multiple populations can signal enhanced uncertainty over action selection, which would translate into greater inhibition onto TANs. The synchrony in the TAN response suggests a global signal (Graybiel et al., 1994), which can then be modulated by inhibitory MSN collaterals across a large range of spiny inputs. The TAN pause response is consistent with a signal of uncertainty that adjusts learning. First, it increases with the unpredictability of a stochastic outcome (Apicella et al., 2009, 2011). Second, pharmacological blockade or lesioning of excitatory input to TANs impairs learning specifically after a change in outcome contingencies (Ragozzino et al., 2002; Bradfield et al., 2013). For an optimal learner, both increases in stochasticity and changes in outcome contingencies result in an increase in uncertainty (Yu and Dayan, 2005; Nassar et al.,
Related resources
A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning
Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires the consideration of uncertainty to dynamically adjust the learning rate. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism i...
Elimination of the Vesicular Acetylcholine Transporter in the Striatum Reveals Regulation of Behaviour by Cholinergic-Glutamatergic Co-Transmission
Cholinergic neurons in the striatum are thought to play major regulatory functions in motor behaviour and reward. These neurons express two vesicular transporters that can load either acetylcholine or glutamate into synaptic vesicles. Consequently cholinergic neurons can release both neurotransmitters, making it difficult to discern their individual contributions for the regulation of striatal ...
Substantia nigra D1 receptors and stimulation of striatal cholinergic interneurons by dopamine: a proposed circuit mechanism.
Dopamine release can regulate striatal acetylcholine efflux in vivo through at least two receptor mechanisms: (1) direct inhibition by dopamine D2 receptors on the cholinergic neurons, and (2) excitation initiated by dopamine D1 receptors. The neuroanatomical locus of the latter population of D1 receptors and the pathway(s) involved in the expression of their influence are controversial issues....
A local circuit model of learned striatal and dopamine cell responses under probabilistic schedules of reward.
Recently, dopamine (DA) neurons of the substantia nigra pars compacta (SNc) were found to exhibit sustained responses related to reward uncertainty, in addition to the phasic responses related to reward-prediction errors (RPEs). Thus, cue-dependent anticipations of the timing, magnitude, and uncertainty of rewards are learned and reflected in components of DA signals. Here we simulate a local c...
A study on striatal local electrical potential changes in an animal model of Parkinson's disease
Parkinson’s disease (PD) is a neurodegenerative disorder that does not develop spontaneously in some animal species. PD can be induced experimentally in some laboratory animals including mouse, rat and horse. Globus pallidus (GP) and substantia nigra pars compacta (SNc) are damaged in patients with PD. The hallmark of PD is a progressive impaired control of movement, an alteration of autonomic ...